Introduction to R

Marine Ecosystem Dynamics

Author

Kinlan Jan, Baptiste Serandour

Plan for today’s lecture

  • The R syntax
  • The RStudio software
  • Variables, functions and vectors
  • Importing data using the readr package

Why use R?

Pros

  • Free
  • Open source
  • Reproducible science
# You can keep track of all the data analysis steps
2 + 2 + 3       # step 1
#> [1] 7
log(2 + 2 + 3)  # step 2
#> [1] 1.94591

Cons

  • Scary
  • Syntax
# This can be scary
library(ggplot2); library(dplyr); set.seed(123)
tibble(Month = sample(month.abb, 100, replace = TRUE),
       Genus = sample(c("Acartia", "Temora", "Centropages", "Pseudocalanus"), 100, replace = TRUE),
       Abundance = rnorm(100, 12, 7)) |>
  group_by(Month, Genus) |>
  summarise(Avg_abundance = mean(Abundance, na.rm = TRUE)) |>
  ggplot(aes(x = Genus, y = Avg_abundance)) +
    geom_boxplot()

R is open and free

This means that people have built tools and functions on top of it that everyone can use!

  • R base functions (already implemented and loaded when starting a new session): e.g., plot(), +, -, sin()
  • Additional functions (from packages that we need to load): e.g., ggplot(), select(), …

How to install and load packages

  • A package needs to be installed only once
  • To use the functions in a package, load it with library() in each new session
install.packages("PackageName")
library(PackageName)
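
A common pattern, sketched here as one possible approach rather than something this lecture prescribes, is to install a package only when it is missing. requireNamespace() is a base R function; the helper name is_installed is made up for illustration.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise
is_installed <- function(package_name) {
  requireNamespace(package_name, quietly = TRUE)
}

# "stats" ships with R, so this check is always TRUE
is_installed("stats")
#> [1] TRUE

# Install readr only if it is missing (commented out: needs internet access)
# if (!is_installed("readr")) install.packages("readr")

# A single function can also be called without library(), using package::function
stats::median(c(1, 5, 9))
#> [1] 5
```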

R syntax

R as a calculator

  • R can evaluate basic operations
2 + 2
#> [1] 4
3 * 4
#> [1] 12
(5 + 2) * (4 - 1)
#> [1] 21
  • And more complex operations
sin(60)  # the argument is in radians, not degrees
#> [1] -0.3048106
log(10)
#> [1] 2.302585
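
Because sin() expects its argument in radians, sin(60) is the sine of 60 radians, not of 60 degrees. A short sketch of the conversion, and of choosing the base of a logarithm (the variable name angle_degrees is only illustrative):

```r
angle_degrees <- 60
sin(angle_degrees * pi / 180)   # convert degrees to radians first
#> [1] 0.8660254
log(100, base = 10)             # log() is the natural logarithm unless base is given
#> [1] 2
log10(100)                      # shortcut for base 10
#> [1] 2
exp(log(10))                    # exp() undoes log()
#> [1] 10
```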

Variables

Variables in R can be of several types:

  • Logical: TRUE or FALSE
  • Numeric: 3.1 or 4
  • Character: "Example"
variable_1 <- 4.3
variable_2 <- c(1, 2, 3)
variable_3 <- "text"
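
The type of a variable can be checked with class(). A small sketch reusing the three variables above (redefined here so the block runs on its own):

```r
variable_1 <- 4.3
variable_2 <- c(1, 2, 3)
variable_3 <- "text"

class(variable_1)
#> [1] "numeric"
class(variable_2)        # a vector of numbers is still "numeric"
#> [1] "numeric"
class(variable_3)
#> [1] "character"
class(TRUE)
#> [1] "logical"
is.numeric(variable_3)   # is.*() functions test for one specific type
#> [1] FALSE
```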

To assign a value to a variable, several options exist:

  • <- e.g. a <- 2
  • -> e.g. 2 -> a
  • assign() e.g. assign("a", 2)
  • = e.g. a = 2


Assigning the same value to multiple variables

variable_4 <- variable_5 <- variable_6 <- "Value"
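
To see that the four assignment forms give the same result, one possible check is sketched below; the names a1 to a4 are only illustrative.

```r
a1 <- 2
2 -> a2
assign("a3", 2)
a4 = 2

# All four variables hold the same value
c(a1, a2, a3, a4)
#> [1] 2 2 2 2

# Chained assignment gives every variable the same value
variable_4 <- variable_5 <- variable_6 <- "Value"
variable_5
#> [1] "Value"
```

Most R code uses <- for assignment, which keeps = free for naming arguments inside function calls.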

Functions

  • All functions follow the same structure, function_name(argument1, ...), but the number of arguments varies
log(10)
plot(x, y)
  • To find out which arguments a function needs, we can open its help page by putting ? before the function name
?plot
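
Arguments can be matched by position or by name. A brief sketch using log(), whose second argument is called base:

```r
log(8, 2)              # by position: x = 8, base = 2
#> [1] 3
log(8, base = 2)       # by name: easier to read
#> [1] 3
log(base = 2, x = 8)   # named arguments can be given in any order
#> [1] 3
args(log)              # prints the argument list of a function
```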

If you want to go a step further

  • You can define your own functions:
my_addition <- function(parameter_1, parameter_2, ...){
  parameter_1 + parameter_2
}


  • And check that it gives the same result as the base R operator:
my_addition(parameter_1 = 1, parameter_2 = 2) == 1 + 2
#> [1] TRUE
  • Note that the comparison operators are written as follows:
    • is equal to: ==
    • is not equal to: !=
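
The comparison operators also work element by element on vectors. One caveat, shown in this sketch, is that == can fail for decimal numbers because of rounding in floating-point arithmetic; all.equal() compares with a small tolerance instead.

```r
c(1, 2, 3) == 2
#> [1] FALSE  TRUE FALSE
c(1, 2, 3) != 2
#> [1]  TRUE FALSE  TRUE

0.1 + 0.2 == 0.3                    # FALSE because of floating-point rounding
#> [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # comparison with a tolerance
#> [1] TRUE
```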

Vectors

  • Vectors can be created using different functions
(vector_1 <- c(1, 3, 6))
#> [1] 1 3 6
(vector_2 <- seq(from = 2, to = 10, by = 3))
#> [1] 2 5 8
(vector_3 <- rep("Yellow", 3))
#> [1] "Yellow" "Yellow" "Yellow"
(vector_4 <- c(vector_1, vector_2))
#> [1] 1 3 6 2 5 8


  • R operates on whole vectors, so a calculation is applied to every element at once
vector_1 * 2
#> [1]  2  6 12
mean(vector_4)
#> [1] 4.166667
class(vector_3)
#> [1] "character"
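
Elements of a vector can be extracted with square brackets, either by position or with a logical condition. A minimal sketch reusing the vectors defined above (recreated here so the block is self-contained):

```r
vector_1 <- c(1, 3, 6)
vector_2 <- seq(from = 2, to = 10, by = 3)
vector_4 <- c(vector_1, vector_2)

vector_1[2]                # second element (indexing starts at 1)
#> [1] 3
vector_4[vector_4 > 4]     # keep only the elements greater than 4
#> [1] 6 5 8
vector_1 + vector_2        # element-by-element addition
#> [1]  3  8 14
length(vector_4)
#> [1] 6
```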

Importing data

  • A convenient and fast way to import tabular data is the readr package
  • Its main functions have the form read_*, where * can be:
    • csv - comma-separated values
    • tsv - tab-separated values
    • csv2 - semicolon-separated values with , as the decimal mark
    • delim - files with any other delimiter, given explicitly
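
To illustrate the difference between two of these variants, the sketch below writes two tiny temporary files and reads them back. The file contents are made-up values used only for illustration, and show_col_types = FALSE assumes readr 2.0 or later.

```r
library(readr)

# Two small temporary files with made-up values, only for illustration
semicolon_file <- tempfile(fileext = ".csv")
writeLines(c("Genus;Abundance", "Acartia;12,5", "Temora;8,9"), semicolon_file)

pipe_file <- tempfile(fileext = ".txt")
writeLines(c("Genus|Abundance", "Acartia|12.5", "Temora|8.9"), pipe_file)

# read_csv2(): ";" separates the columns and "," is the decimal mark
data_csv2 <- read_csv2(semicolon_file, show_col_types = FALSE)

# read_delim(): the delimiter is given explicitly
data_delim <- read_delim(pipe_file, delim = "|", show_col_types = FALSE)

data_csv2$Abundance
#> [1] 12.5  8.9
data_delim$Abundance
#> [1] 12.5  8.9
```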

Example

library(readr)
#> Warning: package 'readr' was built under R version 4.1.2

Example_1 <- readr::read_csv("./../../assets/data/Example_1.csv") 
#> Rows: 100 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): Month, Genus
#> dbl (1): Abundance
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Example_1)
#> # A tibble: 6 × 3
#>   Month Genus         Abundance
#>   <chr> <chr>             <dbl>
#> 1 Dec   Centropages      -0.552
#> 2 Apr   Centropages      12.5  
#> 3 Feb   Centropages      18.4  
#> 4 Sep   Acartia          25.6  
#> 5 Mar   Pseudocalanus     9.70 
#> 6 Jul   Temora            8.90


tail(Example_1)
#> # A tibble: 6 × 3
#>   Month Genus         Abundance
#>   <chr> <chr>             <dbl>
#> 1 Jan   Pseudocalanus     22.7 
#> 2 Feb   Acartia           27.6 
#> 3 Aug   Acartia            7.75
#> 4 Jan   Centropages       17.0 
#> 5 Feb   Centropages        5.95
#> 6 Aug   Temora            17.2
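
The message printed by read_csv() suggests specifying the column types. One way to do so is sketched below on a temporary file; its rows are made-up values that only mimic the structure of Example_1, and the names toy_file and toy_data are illustrative.

```r
library(readr)

# Temporary file with made-up rows that mimic the structure of Example_1
toy_file <- tempfile(fileext = ".csv")
writeLines(c("Month,Genus,Abundance",
             "Jan,Acartia,10.2",
             "Feb,Temora,7.5"), toy_file)

# Declaring the column types avoids the guessing step and its message
toy_data <- read_csv(
  toy_file,
  col_types = cols(
    Month = col_character(),
    Genus = col_character(),
    Abundance = col_double()
  )
)

spec(toy_data)   # shows the column specification that was used
nrow(toy_data)
#> [1] 2
```

Declaring the types also makes the import fail loudly, through parsing warnings, if a column does not contain what is expected.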

Plan for tomorrow

  • Introduction to tidyverse
  • Pipe the data using magrittr
  • Clean the data using tidyr
  • Arrange the data using dplyr
  • Plot using ggplot2

Do not hesitate to use Google to get help!

If you run into an issue, you are probably not the first: someone has most likely already asked for a solution on a forum!